Fix flaky test: test_mkldnn.test_activation #12377 (#12418)
Conversation
Please enable the case that was skipped in another PR.
@marcoabreu @lebeg please help take a review.
@@ -292,7 +291,7 @@ def check_activation_training(stype):
     in_location = [mx.nd.array(data_tmp).tostype(stype)]

     test = mx.symbol.Activation(data, act_type="relu")
-    check_numeric_gradient(test, in_location, numeric_eps=1e-2, rtol=0.16, atol=1e-4)
+    check_numeric_gradient(test, in_location, numeric_eps=1e-6, rtol=0.16, atol=1e-4)
Hm, this change lowers the epsilon by four orders of magnitude. If this test was flaky before due to computational precision, wouldn't this change make it worse?
Actually, it will improve the precision of the reference results, because we use the smaller eps in the finite difference method.
Previously, the flakiness was caused by the large eps, which cannot resolve the small differences around the activation's kink.
Numerical calculation is sometimes a little tricky :)
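To make this concrete, here is a minimal sketch in plain NumPy (independent of MXNet's check_numeric_gradient; relu, numeric_grad, and the probe point are hypothetical names chosen for illustration). A central difference with a large eps straddles the ReLU kink at zero and averages the slopes on both sides:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def numeric_grad(f, x, eps):
    # Central finite difference: (f(x + eps) - f(x - eps)) / (2 * eps)
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

x = 4e-3  # close to the kink at 0; the analytic ReLU gradient here is 1.0

print(numeric_grad(relu, x, eps=1e-2))  # 0.7 -- the stencil crosses the kink
print(numeric_grad(relu, x, eps=1e-6))  # 1.0 -- matches the analytic gradient
```

Random test inputs that happen to fall within eps of zero hit exactly this case, which would explain why the failure was intermittent.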
This epsilon affects the baseline calculation, not the MKL-DNN calculation. And the smaller the epsilon is, the more accurate the baseline (the numerical reference gradient, as in Theano) becomes, down to floating-point round-off. So this change won't make it worse.
@pengzhao-intel @luobao-intel thanks for your explanations!
Thanks for the fix @luobao-intel!
…che#12418)" This reverts commit 445967e.
* test_activation_rec_eps * enable case
* Revert "Removing the re-size for validation data, which breaking the validation accuracy of CIFAR training (#12362)" This reverts commit ceabcaa. * Revert "[MXNET-580] Add SN-GAN example (#12419)" This reverts commit 46a5cee. * Revert "Remove regression checks for website links (#12507)" This reverts commit 619bc3e. * Revert "Revert "Fix flaky test: test_mkldnn.test_activation #12377 (#12418)" (#12516)" This reverts commit 7ea0533. * Revert "further bump up tolerance for sparse dot (#12527)" This reverts commit 90599e1. * Revert "Fix broken URLs (#12508)" This reverts commit 3d83c89. * Revert "Temporarily disable flaky tests (#12520)" This reverts commit 35ca13c. * Revert "Add support for more req patterns for bilinear sampler backward (#12386)" This reverts commit 4ee866f. * Revert "Change the way NDArrayIter handle the last batch (#12285)" This reverts commit 597a637.
Description
Fix the flaky test failure test_mkldnn.test_activation (#12377).
The problem lies in the finite difference method used for the gradient comparison: the large eps produced an inaccurate numerical reference gradient, so this pull request reduces the eps for a more accurate calculation.
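For context, here is a hedged sketch of the fixed test as it might read after this change (only the lines visible in the diff above come from the PR; the shape, the input data, and the driver loop are assumptions for illustration):

```python
import numpy as np
import mxnet as mx
from mxnet.test_utils import check_numeric_gradient

def check_activation_training(stype):
    shape = (3, 4)                            # assumed shape
    data_tmp = np.random.normal(size=shape)   # random inputs may land near the ReLU kink
    data = mx.symbol.Variable('data', stype=stype)
    in_location = [mx.nd.array(data_tmp).tostype(stype)]

    test = mx.symbol.Activation(data, act_type="relu")
    # The fix: numeric_eps lowered from 1e-2 to 1e-6 so the finite-difference
    # reference gradient no longer averages across the kink
    check_numeric_gradient(test, in_location, numeric_eps=1e-6, rtol=0.16, atol=1e-4)

for stype in ['default', 'row_sparse']:       # assumed storage types under test
    check_activation_training(stype)
```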
@pengzhao-intel